gNOMO2: a comprehensive and modular pipeline for integrated multi-omics analyses of microbiomes

Abstract

Background: In recent years, omics technologies have offered an exceptional chance to gain a deeper insight into the structural and functional characteristics of microbial communities. As a result, there is a growing demand for user-friendly, reproducible, and versatile bioinformatic tools that can effectively harness multi-omics data to provide a holistic understanding of microbiomes. Previously, we introduced gNOMO, a bioinformatic pipeline tailored to analyze microbiome multi-omics data in an integrative manner. In response to the evolving demands within the microbiome field and the growing necessity for integrated multi-omics data analysis, we have implemented substantial enhancements to the gNOMO pipeline.

Results: Here, we present gNOMO2, a comprehensive and modular pipeline that can seamlessly manage various omics combinations, ranging from 2 to 4 distinct omics data types, including 16S ribosomal RNA (rRNA) gene amplicon sequencing, metagenomics, metatranscriptomics, and metaproteomics. Furthermore, gNOMO2 features a specialized module for processing 16S rRNA gene amplicon sequencing data to create a protein database suitable for metaproteomics investigations. Moreover, it incorporates new differential abundance, integration, and visualization approaches, enhancing the toolkit for a more insightful analysis of microbiomes. The functionality of these new features is showcased through the use of 4 microbiome multi-omics datasets encompassing various ecosystems and omics combinations. gNOMO2 not only replicated most of the primary findings from these studies but also offered further valuable perspectives.

Conclusions: gNOMO2 enables the thorough integration of taxonomic and functional analyses in microbiome multi-omics data, offering novel insights in both host-associated and free-living microbiome research. gNOMO2 is available freely at https://github.com/muzafferarikan/gNOMO2.


Background
Microbiomes play pivotal roles in shaping the environments they inhabit, such as influencing host health and disease [1] and contributing to the overall diversity of life on Earth [2]. The comprehensive understanding of microbial communities and their impact on human health, ecosystems, and numerous other domains has become an increasingly prominent field of investigation [3].
Over the past decade, there has been a substantial increase in various omics data types generated from various microbiomes due to the development of novel techniques and reduced experimental costs [4, 5]. Hence, the multi-omics approach has emerged as a powerful strategy to elucidate the functional potential of microbiomes, going beyond taxonomic profiling to decipher the molecular mechanisms [6][7][8]. The metabolic pathways, ecological interactions, and adaptive responses of microbial communities can be uncovered by integrating multiple omics data [9]. Such a comprehensive perspective is invaluable for potential implications in diverse fields, such as human health, agriculture, and environmental conservation.
To unravel the complex web of interactions within microbiomes and extract meaningful insights from the vast amount of data generated by advanced omics technologies, the development of sophisticated analytical tools and data analysis pipelines is essential [10]. Consequently, many approaches and tools have emerged to address these needs [11][12][13][14][15]. One such pipeline, gNOMO, facilitates integrated multi-omics analysis encompassing metagenomics (MG), metatranscriptomics (MT), and metaproteomics (MP) through the efficient generation and use of a proteogenomic database, as well as differential abundance analysis-based integration at the pathway and taxa levels [16]. However, gNOMO (along with other existing multi-omics analysis tools in the microbiome field) currently lacks the capability of processing 16S ribosomal RNA (rRNA) gene amplicon sequencing (AS) data to create a proteogenomic database, even though it is well known that the protein sequence database directly impacts the outcome of any MP analysis [17].
For MP, it was shown that unnecessarily large databases can lead to the exclusion of valid peptide spectrum matches [18] while also demanding more time and memory resources. Conversely, smaller databases carry the risk of generating false-positive results that are irrelevant to the sample. In multi-omics-based microbiome studies that combine MP with MG or MT, protein databases are typically generated from MG and MT data. However, for studies that integrate MP with AS, there is currently no tool available to automatically create a protein database from AS data for MP analysis. Generating an AS-based protein database can also be valuable for studies that integrate MP, MG, and MT, as sequencing depth limitations may affect the detection of microbes and genes, thereby influencing MP analysis. Additionally, there is a lack of tools for conducting end-to-end integrated analysis of AS data in conjunction with MP results. Most existing multi-omics analysis tools are tailored to specific omics combinations and lack a modular architecture that can accommodate various omics combinations. Furthermore, there is a shortage of multi-omics analysis tools that incorporate multiple integration approaches and present results at different analysis stages, facilitating further investigations using other tools.
To address the abovementioned needs of multi-omics data analyses in microbiome research, we have made significant improvements to the gNOMO pipeline. These enhancements encompass the following key modifications: (i) We have restructured the pipeline, introducing a flexible and modular architecture that empowers gNOMO2 to seamlessly process a wide array of multi-omics data derived from microbiomes. With 6 independent modules, gNOMO2 can effortlessly manage a vast spectrum of omics combinations, ranging from 2 to 4 distinct omics data types, which include AS, MG, MT, and MP. (ii) One of the standout features of gNOMO2 is its ability to process AS data and generate a protein database suitable for MP studies. (iii) Additionally, gNOMO2 incorporates 3 distinct approaches for integrated multi-omics analysis: proteogenomic database-based integration, differential abundance-based integration (at taxa, functional category, and pathway levels), and joint visualization-based integration. These innovative approaches offer a comprehensive perspective on the microbiomes, enabling researchers to gain deeper insights into their structural and functional properties. gNOMO2 is an open-source tool and freely available at [19].

Overview of the gNOMO2 pipeline
The gNOMO2 (RRID:SCR_025293) pipeline is designed as a tool that relies on Snakemake (RRID:SCR_003475) [20], a well-established bioinformatic workflow management system. This framework guarantees scalable data analyses and the generation of consistent and reproducible output data. The pipeline incorporates a suite of software tools written in various programming languages, including R (RRID:SCR_001905), Python (RRID:SCR_008394), Shell, and Perl, enabling the seamless execution of multi-omics analysis steps for microbiome data. The input data and program parameters in Snakemake are easily defined through a straightforward configuration file. gNOMO2 streamlines this process by automatically generating the configuration file from the provided input data along with default parameters.
To enhance user experience, the pipeline relies on publicly accessible tools distributed as Conda environments, simplifying the installation process for individual software components for the end user. gNOMO2 ensures result consistency and makes multi-omics data analysis accessible to individuals with basic bioinformatics skills. The pipeline accepts raw sequencing files (in fastq.gz format) for AS, MG, and MT and tandem mass spectrometry (MS/MS) spectrum files (in mgf format) for MP data.
The original gNOMO accepts MG, MT, and MP data as input and generates results for differential abundance analysis in each omics layer. It also constructs a protein database using MG and MT data and performs both differential abundance and pathway-level integrated analyses (Fig. 1A). In contrast, the gNOMO2 pipeline comprises 6 modules that facilitate direct analysis of various omics combinations. Each module includes preprocessing, analysis of each omics dataset, data integration, and visualization steps (Fig. 1B). We implemented changes to both the analysis workflow and pipeline structure. For workflow adjustments, we updated the quality control, merging, assembly, differential abundance, and visualization steps. In the quality control phase, we switched from using PrinSeq (RRID:SCR_005454) [21] to Trimmomatic (RRID:SCR_011848) [22] for cleaning and trimming reads. For read merging, we replaced fastq-join with FLASH2 (RRID:SCR_005531) [23] to merge paired-end reads. In the assembly step, we transitioned from Ray (RRID:SCR_001916) [24] to metaSPAdes (RRID:SCR_000131) [25] for de novo assembly of metagenomic sequences and from Ray to rnaSPAdes (RRID:SCR_016992) [26] for de novo assembly of metatranscriptomic sequences. In the differential abundance analysis step, we replaced LefSe (RRID:SCR_014609) [27] with MaAsLin2 (RRID:SCR_023241) [28]. For visualization, we replaced Krona (RRID:SCR_012785) [29] with ggplot2 (RRID:SCR_014601) [30] to analyze taxonomic composition, enabling combined visualization of samples. We also replaced LefSe with MaAsLin2 for visualizing the results of differential abundance analysis. For pathway-level analysis results, we kept Pathview (RRID:SCR_002732) [31] unchanged, but for joint visualization analysis, we used the combi (RRID:SCR_024986) [32] package to visualize outputs. These workflow changes and comparisons between gNOMO and gNOMO2 are depicted in Supplementary Fig. S1.
Additionally, we introduced changes to facilitate the incorporation of metadata tables into analyses and automated the creation of the configuration file. To enhance and update the structure of the original gNOMO, we implemented 6 modules in the new gNOMO2 pipeline, allowing for the processing of different omics combinations. The original gNOMO pipeline consisted of only 1 module (Module 5 in gNOMO2), while gNOMO2 introduced 5 more modules for specific combinations, along with the ability to accept AS data as input.

Module 1: Processing AS data and generating a protein database for MP analysis
Module 1 is designed to process raw AS data in both paired-end and single-end formats, providing a directly usable protein database for MP data analysis. The first step in this module involves using Trimmomatic to remove sequencing adapters, low-quality bases from raw reads, and reads that are too short (default minimum length > 25 bp). The quality of both raw and trimmed reads is assessed using FastQC (RRID:SCR_014583) [33], and analysis results for all samples are summarized using MultiQC (RRID:SCR_014982) [34]. If the data are in paired-end format, the quality-controlled reads are merged using FLASH2. Subsequently, DADA2 (RRID:SCR_023519) [35] is used in conjunction with the SILVA database (RRID:SCR_006423) [36] to obtain an amplicon sequence variant (ASV) abundance table and taxonomy assignments for each ASV. After determining the user-defined top "n" most abundant taxa at a user-defined taxonomic level, protein sequences of all complete genomes for these taxa are downloaded from the National Center for Biotechnology Information (NCBI) database using ncbi-genome-download (RRID:SCR_024977) [37]. All downloaded sequences are merged and cleaned, and a single protein sequence is retained from each set of identical protein sequences to effectively remove redundancy using SeqKit (RRID:SCR_018926) [38]. Importantly, for host-associated microbiome samples, the user can define the host species name in a configuration file. Host protein sequences are then included in the final protein database together with microbial proteins. Module 1 allows researchers to construct a comprehensive protein database, either from their own AS datasets or publicly available ones. Furthermore, this module creates a phyloseq (RRID:SCR_013080) [39] object containing an abundance table, a taxonomy table, and additional metadata. This enables ongoing microbiome analysis using other analysis tools. In addition, an abundance plot is automatically generated to assess the abundance distribution of the top "n" taxa as defined by the user.
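The two selection steps in Module 1 (ranking taxa by abundance, then collapsing identical protein sequences) can be illustrated with a minimal Python sketch. This is not gNOMO2's implementation, which relies on DADA2, ncbi-genome-download, and SeqKit. The function names, the toy ASV table, and the choice to rank taxa by summed counts across samples are all assumptions made for illustration.

```python
from collections import defaultdict


def top_n_taxa(asv_counts, asv_taxonomy, n):
    """Return the n taxa with the highest total abundance across all samples.

    asv_counts maps an ASV id to its per-sample counts; asv_taxonomy maps an
    ASV id to its taxon at the chosen level (e.g. genus). Both are hypothetical
    in-memory stand-ins for the ASV abundance and taxonomy tables.
    """
    totals = defaultdict(int)
    for asv, per_sample in asv_counts.items():
        taxon = asv_taxonomy.get(asv)
        if taxon is None:  # skip ASVs without an assignment at this level
            continue
        totals[taxon] += sum(per_sample)
    # Sort by descending abundance, breaking ties alphabetically.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [taxon for taxon, _ in ranked[:n]]


def deduplicate_proteins(records):
    """Keep the first (header, sequence) record for each distinct sequence."""
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


# Toy data (hypothetical counts for two samples).
asv_counts = {"ASV1": [120, 80], "ASV2": [30, 10], "ASV3": [60, 90], "ASV4": [5, 5]}
asv_taxonomy = {"ASV1": "Streptococcus", "ASV2": "Veillonella",
                "ASV3": "Veillonella", "ASV4": "Rothia"}
selected = top_n_taxa(asv_counts, asv_taxonomy, n=2)

proteins = [("p1", "MKV"), ("p2", "MKV"), ("p3", "MKL")]
nonredundant = deduplicate_proteins(proteins)
```

In this toy case the two ASVs assigned to Veillonella are summed before ranking, which is why aggregation at the chosen taxonomic level has to happen before the top "n" cut.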

Module 2: Integrated multi-omics analysis of AS and MP data
Module 2 accepts raw paired-end and single-end AS and MP data as inputs. The AS data undergo the processing steps described in Module 1. The generated AS-based protein database is then used by the database search algorithm MS-GF+ (RRID:SCR_015646) [40] to identify peptides in the raw MP data. A peptide abundance table is subsequently created by aggregating results from individual samples. Taxonomy and enzyme commission (EC) assignments for the identified peptides are carried out using Pyteomics (RRID:SCR_024988) [41] and Unipept (RRID:SCR_024987) [42]. Then, MaAsLin2 is employed to determine differentially abundant taxa based on both AS and MP data. In this analysis, linear models are employed to identify taxa that exhibit significant differences in abundance between sample groups at AS and MP levels while accounting for confounding variables and other factors that might impact the abundance of microbial taxa. Users can define the phenotype of interest and covariates in the Snakemake configuration file. Furthermore, users can specify the normalization or transformation to apply prior to conducting the differential abundance analysis. In addition, a joint visualization of MP and AS results is performed using the combi R package. This joint visualization makes it possible to integrate and compare the results from both types of omics data (taxa for AS and peptides for MP), providing a comprehensive view on a single ordination plot and helping researchers to identify associations between features from different omics datasets and covariates in the metadata table. The final outputs include abundance tables based on both AS and MP data, detailing the abundance of taxa and peptides in each sample, respectively. Module 2 also generates results from the differential abundance analysis, highlighting the taxa that were significantly different between sample groups based on their AS and MP profiles, and the joint visualization analysis results, which provide a graphical representation of the combined AS and MP features and aid in the interpretation of the integrated results.
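The aggregation of per-sample peptide identifications into one abundance table can be sketched as follows. The text does not specify how abundance is quantified, so this example assumes simple counts of peptide-spectrum matches per sample; the function name and the input structure are likewise hypothetical and do not reflect the MS-GF+ output format.

```python
from collections import Counter


def peptide_abundance_table(psms_by_sample):
    """Build a peptide-by-sample count table from per-sample peptide lists.

    psms_by_sample maps a sample name to the list of peptide sequences
    identified in that sample (one entry per peptide-spectrum match).
    Returns the ordered sample names and a dict of peptide -> counts.
    """
    counts = {sample: Counter(peps) for sample, peps in psms_by_sample.items()}
    samples = sorted(counts)
    peptides = sorted(set().union(*[c.keys() for c in counts.values()]))
    # Counter returns 0 for peptides that were not observed in a sample.
    table = {pep: [counts[s][pep] for s in samples] for pep in peptides}
    return samples, table


# Toy identifications for two samples (hypothetical peptide sequences).
psms = {"S1": ["PEPTIDEA", "PEPTIDEA", "PEPTIDEB"], "S2": ["PEPTIDEB"]}
samples, table = peptide_abundance_table(psms)
```

Peptides missing from a sample are filled with zero, so every row has one value per sample, which is the shape a downstream differential abundance tool would expect.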

Module 3: Integrated multi-omics analysis of MG and MP data
Module 3 is designed to handle raw paired-end MG and MP data. MP data are processed as outlined in Module 2. MG raw reads are quality checked and cleaned using Trimmomatic, followed by merging with FLASH2. The quality of both raw and trimmed reads is assessed using FastQC, and analysis results for all samples are summarized using MultiQC. Cleaned and merged reads are then mapped to the NCBI nonredundant (nr) database using Kaiju (RRID:SCR_022775) [43], which generates taxonomic classification results. In parallel, clean reads are also used for assembly with metaSPAdes, and obtained contigs are classified as eukaryotic and prokaryotic using EukRep (RRID:SCR_024985) [44]. Proteins within the prokaryotic contigs are predicted using Prodigal (RRID:SCR_011936) [45] while Augustus (RRID:SCR_008417) [46] is used for proteins within eukaryotic contigs. Then, functional annotation of these predicted proteins is carried out using eggNOG (RRID:SCR_002456) [47] to obtain KEGG (RRID:SCR_012773) [48] Orthology (KO) identifiers, while InterProScan (RRID:SCR_005829) [49] is employed for TIGRFAMs (RRID:SCR_005493) [50] functional annotation.
Module 3 generates several final outputs for both MG and MP analyses. These include taxonomic abundance tables, taxonomic composition plots, and results from taxa and functional annotation (TIGRFAMs)-based differential abundance analyses. The module also provides integrated analysis outputs: (i) a joint visualization of omics layers as described in Module 2 and (ii) a pathway-level integrated analysis conducted using the Pathview package. The Pathview plots in this analysis illustrate the log2 ratio of the mean abundance of individual omic features under different user-defined conditions across various omics levels, following a fold change normalization. These log2 ratios are calculated and compared using shared enzyme and KEGG ids between different omics layers. Coverages for gene sequences of each predicted protein by MG are calculated using BBMap (RRID:SCR_016965) [51]. The calculated ratios are visualized on metabolic pathway nodes, which are split into omics types (for example, 2 splits for Module 3 for MG and MP data). The color of each split part shows the abundance change in the relevant features between sample groups for the specific omics level, allowing for the visual tracking of changes in different omics levels on the same node.
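The per-node values behind such a split-node pathway plot can be sketched in a few lines of Python. This is an illustration of the log2 ratio of group means restricted to identifiers shared between omics layers, not the Pathview-based code used in gNOMO2. The feature identifiers, the group labels, and the small pseudocount used to avoid division by zero are assumptions.

```python
import math


def log2_mean_ratio(abundance, groups, case, control, pseudocount=1e-9):
    """Return log2(mean(case) / mean(control)) for every feature.

    abundance maps a feature id to per-sample values; groups gives the
    condition label of each sample in the same order. The pseudocount is an
    assumption of this sketch, added to guard against zero means.
    """
    ratios = {}
    for feature, values in abundance.items():
        case_vals = [v for v, g in zip(values, groups) if g == case]
        ctrl_vals = [v for v, g in zip(values, groups) if g == control]
        mean_case = sum(case_vals) / len(case_vals)
        mean_ctrl = sum(ctrl_vals) / len(ctrl_vals)
        ratios[feature] = math.log2((mean_case + pseudocount) / (mean_ctrl + pseudocount))
    return ratios


def split_node_values(mg_ratios, mp_ratios):
    """Pair MG and MP log2 ratios for identifiers present in both layers."""
    shared = sorted(set(mg_ratios) & set(mp_ratios))
    return {feature: (mg_ratios[feature], mp_ratios[feature]) for feature in shared}


# Toy data: two samples per condition, hypothetical feature ids.
groups = ["A", "A", "B", "B"]
mg = {"KO_X": [4.0, 4.0, 2.0, 2.0], "KO_Y": [1.0, 1.0, 1.0, 1.0]}
mp = {"KO_X": [1.0, 1.0, 2.0, 2.0], "KO_Z": [3.0, 3.0, 3.0, 3.0]}

nodes = split_node_values(
    log2_mean_ratio(mg, groups, case="B", control="A"),
    log2_mean_ratio(mp, groups, case="B", control="A"),
)
```

Here the only shared identifier, KO_X, receives a negative value for MG and a positive value for MP, which corresponds to a node whose two halves would be colored in opposite directions.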

Module 4: Integrated multi-omics analysis of MG and MT data
Module 4 is designed to handle raw paired-end MG and both paired-end and single-end MT data. MG data follow the processing steps outlined in Module 3. For MT data, a similar workflow is employed, with the exception that a de novo assembly step is conducted using rnaSPAdes in place of metaSPAdes. The final outputs of Module 4 include an MG- and MT-based proteogenomic database, taxonomic and functional annotation-based differential abundance analysis results for both omics levels, taxonomic abundance tables and plots, joint visualization of omics layers as described in Module 2, and pathway-level integrated analysis results as outlined in Module 3.

Module 5: Integrated multi-omics analysis of MG, MT, and MP data
Module 5 accepts raw paired-end MG, as well as both paired-end and single-end MT and MP data. MG and MT data follow the processing steps outlined in Module 4 while MP data are processed as described in Module 2. The final outputs of Module 5 include an MG- and MT-based proteogenomic database, taxonomic and functional annotation-based differential abundance analysis results for 3 omics levels, taxonomic abundance tables and plots, a peptide abundance table for MP, joint visualization of omics layers, and pathway-level integrated analysis results as outlined in Module 3.

Module 6: Integrated multi-omics analysis of AS, MG, MT, and MP data
Module 6 accepts both paired-end and single-end AS and MT data, paired-end MG, and MP data. MG, MT, and MP data follow the processing steps outlined in Module 5. However, the final outputs of Module 6 include a proteogenomic database, which is generated by combining AS-, MG-, and MT-based downloaded/predicted protein sequences, taxonomic and functional annotation-based differential abundance analysis results for 4 omics levels, taxonomic/peptide abundance tables and plots, joint visualization of omics layers, and pathway-level integrated analysis results as outlined in Module 3.
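The correspondence between omics combinations and the 6 modules described above can be summarized as a lookup table. The sketch below is only an illustration of that mapping; the function name and the error handling are hypothetical and are not part of the gNOMO2 interface, where the module is chosen by the user.

```python
# Mapping of supported omics combinations to module numbers,
# taken from the module descriptions in the text.
MODULES = {
    frozenset({"AS"}): 1,
    frozenset({"AS", "MP"}): 2,
    frozenset({"MG", "MP"}): 3,
    frozenset({"MG", "MT"}): 4,
    frozenset({"MG", "MT", "MP"}): 5,
    frozenset({"AS", "MG", "MT", "MP"}): 6,
}


def select_module(omics):
    """Return the module number for a collection of omics type labels.

    Raises ValueError for combinations that no module covers
    (for example AS together with MG only).
    """
    key = frozenset(label.upper() for label in omics)
    if key not in MODULES:
        raise ValueError(f"no module for combination: {sorted(key)}")
    return MODULES[key]


module_for_mg_mp = select_module(["mg", "mp"])
module_for_all = select_module(["AS", "MG", "MT", "MP"])
```

Using a frozenset as the key makes the lookup independent of the order in which the omics types are listed.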

Analyses
To illustrate the utility of gNOMO2, we reanalyzed samples from 4 previously published microbiome studies involving various multi-omics combinations, using the respective publicly available datasets.

Analyzing the association of saliva content with oral cancer
Saliva is a complex biofluid that comprises various components, including DNA, RNA, proteins, metabolites, and microbiota. As a result, it is considered a promising source of relevant biomarkers for a variety of diseases [52]. Granato et al. [53] combined AS and MP analyses to investigate the association between saliva content and oral cancer. The study suggests that oral microbiota and their protein abundance have potential diagnosis and prognosis value for oral cancer patients. Here, we showcase how Modules 1 (AS) and 2 (AS and MP) of gNOMO2 can be used to efficiently reproduce the findings.
The AS data were obtained from NCBI SRA under BioProject identifier PRJNA700849 while MP data were retrieved from PRIDE (RRID:SCR_003411) [54] under accession number PXD022859. The dataset included saliva samples from 8 healthy controls and 15 oral cancer patients. To streamline downstream analyses, we merged triplicates of AS samples and used cell debris MP samples for all analyses. The taxonomic composition results based on AS data across samples, as generated by gNOMO2, were consistent with the reported results, demonstrating similar abundance distributions and the presence of the same most abundant genera (Fig. 2A). In their study, Granato et al. [53] constructed a protein database containing 1,160,275 protein sequences from the 12 most abundant bacterial genera and humans. We applied the same parameters in gNOMO2 to achieve comparable results, with settings such as taxa_level: Genus, top_n: 12, and host: Homo sapiens. gNOMO2 automatically generated a protein database from AS data by determining the 12 most abundant bacterial genera. It then retrieved all protein sequences from 1,992 genomes belonging to these bacterial genera, along with human host proteins, resulting in a total of 1,240,988 protein sequences. The discrepancy in the number of protein sequences between the generated protein databases may be attributed to variations in analysis timing and database differences. Granato et al. [53] used HOMD, a database specific to oral microbiome studies, while gNOMO2 uses the NCBI database, which is intended to cover all microbiome study types.
Within gNOMO2, users can also perform differential abundance analysis at both omics levels, yielding statistical test results and plots for differentially abundant taxa. For instance, we presented one of the differentially abundant taxa from the AS-based (Fig. 2B, upper) and MP-based results (Fig. 2B, lower). AS-based differential abundance analysis showed a decrease in the abundance of Veillonella associated with oral cancer (Fig. 2B, upper), which corresponds to a key finding in the Granato et al. [53] study and previous studies [55]. Interestingly, gNOMO2 detected a reduction in the abundance of peptides classified as Homo in oral cancer patients (Fig. 2B, lower) while the original study did not report any statistically significant changes. This divergence may result from differences in analysis approaches, as gNOMO2 employs a peptide-based taxonomy by Unipept and MaAsLin2 for differential abundance analysis instead of a protein-based approach. Furthermore, it is important to note that we did not account for other covariates that may affect the results.
Finally, gNOMO2 generates a joint visualization plot for AS, MP, and metadata (Fig. 2C). This plot confirms the association of Veillonella with oral health status based on AS data and additionally reveals associations between some detected peptides and the oral health status of the participants. Notably, InterPro entries assigned to these peptides included human albumin proteins, which were previously reported to be associated with oral cancer [56, 57].

Exploring potential and active functions within the human gut microbiota
The human gut microbiota is widely recognized for its important roles in both health and disease. A comprehensive understanding of both potential and active features can provide valuable insights into the mechanisms governing various physiological processes and pathologies, ultimately leading to more effective strategies for maintaining and improving human well-being.
Tanca et al. [58] employed MG and MP to explore the potential and active functions in the gut microbiota of a healthy human cohort. Here, we used Module 3 (MG and MP) of gNOMO2 to efficiently reanalyze the multi-omics data from their study. The MG data were obtained from the NCBI SRA under BioProject identifier PRJEB19090, while the MP data were retrieved from PRIDE under accession number PXD005780. The dataset included gut microbiota samples from 6 males and 8 females.
We employed gNOMO2 to investigate potential differences between male and female participants. Taxonomic composition results based on MG and MP data, as generated by gNOMO2, exhibited a significant overlap with the findings of Tanca et al. [58], particularly concerning the most abundant genera (Fig. 3A, upper). MG-based differential abundance analysis, using default parameters, indicated a notably higher abundance of Legionella in females. Nevertheless, it is important to approach this finding with caution, given that Legionella is a bacterial genus typically associated with water and soil environments, often considered a potential source of contamination in human microbiome studies [59].
Functional annotations derived from TIGRFAMs for the differential abundance analysis indicated a reduction in biotin synthesis (Fig. 3B, lower). The joint visualization plot depicted both MG and MP features along with covariates such as body mass index, age, and sex (Fig. 3C). In our pathway-level integration analysis, we illustrated the components of pyrimidine metabolism and how variations in their abundance can be observed among study groups across different omics levels (Fig. 3D). As a case in point, cytidine deaminase (EC 3.5.4.5) displayed a decreased abundance in females at the MG level (colored green, left), while its abundance increased at the MP level (colored red, right). This discrepancy suggests a decrease in the abundance of taxa carrying the corresponding gene but a higher expression of the protein. Again, this highlights the significance of adopting a multi-omics perspective when drawing conclusions in microbiome studies.

Investigating the role of microbiota of the Maasdam cheese during ripening
The microbiota present in cheese plays a crucial role in the maturation and development of its distinctive flavor, making it a pivotal aspect for the cheese industry. Duru et al. [60] combined MG and MT to track shifts in both taxonomic compositions and gene expressions of Swiss-type Maasdam cheese microbiota during the ripening process. Here, we used Module 4 (MG and MT) of gNOMO2 to efficiently reanalyze multi-omics data from their research. MG and MT data were retrieved from the EBI ENA under BioProject identifier PRJEB23938. The dataset comprised 3 samples from day 12 and 3 samples from day 37 of the ripening process.
We employed gNOMO2 to investigate potential differences between different stages of the ripening process. Taxonomic composition results generated by gNOMO2 based on MG and MT data showed that Lactococcus, Lactobacillus, and Propionibacterium were the 3 most abundant genera across samples (Fig. 4A), consistent with the findings of Duru et al. [60]. Differential abundance analyses revealed a significantly higher relative abundance of Propionibacterium, the main bacterial genus responsible for propionate metabolism in the Maasdam cheese, in cold ripening samples at both MG and MT levels (Fig. 4B), which also aligns well with the results of the original study.
The joint visualization plot depicted both MG and MT features along with the ripening types (Fig. 4C). In our exploration of pathway-level integration, we depicted the elements of propionate metabolism and highlighted how fluctuations in their abundance varied across study groups at MG and MT levels (Fig. 4D). Notably, genes related to propionate production exhibited higher abundance in cold ripening samples (day 37) compared to warm ripening ones (day 12) at the MT level (colored red, right), while their levels were not significantly different at the MG level (colored gray, left). As a result, we did not observe a decrease in expression of genes responsible for propionate production, contrary to findings in the original study. This discrepancy may originate from methodological differences between the studies. The gNOMO2 pipeline compares the expression of propionate production genes against total gene expression, whereas the Duru et al. [60] study compared these genes against the overall expression of the Propionibacterium genome obtained in their research. Consequently, the relative expression of these genes might appear higher when assessed against all genes but lower when measured against only Propionibacterium genes. To validate this, we conducted comparisons using the Propionibacterium genome from the original study in the gNOMO2 pipeline for gene expression levels. Changing the denominator from all genes to Propionibacterium genes yielded results consistent with the original study.
Our findings emphasize the critical role of accurately interpreting analysis outcomes based on the structure of the analytical pipeline. Assuming a default approach, particularly during comparison steps, could lead to unsupported conclusions. In meta-omics studies, various approaches can be employed for data analysis. While none of these approaches are inherently wrong, they may not align with the goals set by the research group. When the pipeline's structure is well defined, no inconsistencies in biological conclusions would be expected. Additionally, we stress the importance of clear language in explaining results in research articles, as failure to do so may mislead readers. In this instance, the discrepancy was primarily due to differences between the approach depending on comparisons at the individual metagenome assembled genome (MAG) level and the gNOMO2 approach, which compares with the whole community.
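The effect of the denominator can be made concrete with a small worked example in Python. The read counts below are entirely hypothetical and were chosen only to show how the same target genes can appear upregulated relative to the whole community yet downregulated relative to their own genome; they are not values from either study.

```python
def relative_expression(target_reads, denominator_reads):
    """Fraction of the denominator's reads that map to the target genes."""
    return target_reads / denominator_reads


def fold_change(later, earlier, denominator):
    """Ratio of relative expression between two time points."""
    return (relative_expression(later["target"], later[denominator])
            / relative_expression(earlier["target"], earlier[denominator]))


# Hypothetical MT read counts per time point:
#   target            reads mapped to propionate production genes
#   propionibacterium reads mapped to the whole Propionibacterium genome
#   community         reads mapped to all genes in the community
day12 = {"target": 100, "propionibacterium": 1_000, "community": 100_000}
day37 = {"target": 300, "propionibacterium": 6_000, "community": 100_000}

# Whole-community denominator: 0.001 -> 0.003, i.e. expression appears to rise.
community_fc = fold_change(day37, day12, "community")

# Genome-level denominator: 0.10 -> 0.05, i.e. expression appears to fall.
genome_fc = fold_change(day37, day12, "propionibacterium")
```

In this toy case the genus as a whole grows sixfold in the transcript pool while the target genes grow only threefold, so the two normalizations point in opposite directions without either being computed incorrectly.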

Determining microbiome dynamics in a w astew ater treatment plant
Char acterization of micr obial comm unities acr oss v arious metaomics layers offers important insights into their potential traits and functionalities.Herold et al. [ 61 ] utilized MG, MT, MP, and metabolomics to explore the responses of microbial populations in a biological w astew ater treatment plant to disturbances.In our study, we demonstrate how Modules 5 (MG, MT, and MP) and 6 (AS, MG, MT, and MP) of gNOMO2 effectiv el y r eplicate some of their findings using a subset of the samples.
We obtained AS, MG, and MT sequencing data from EBI ENA (BioProject identifier PRJNA230567) and MP data from PRIDE (ac- cession number PXD013655).To investigate seasonal variations reported by Herold et al. [ 61 ], we selected samples showcasing the most distinct differences between summer and winter seasons, encompassing 5 samples fr om eac h.Additionall y, we incor por ated 10 AS samples pr e viousl y collected from the same w astew ater treatment plant by the same research group to assess Module 6.
Our analysis, performed using gNOMO2, r e v ealed taxonomic composition results (AS, MG, MT, and MP data) that partially aligned with the findings by Herold et al. [ 61 ] (Fig. 5 A).Ho w e v er, unlike the original study, we did not observe pronounced compositional changes in winter samples (Fig. 5 A).This discrepancy may be attributed to differing a ppr oac hes in taxonomic composition anal ysis as Her old et al. utilized taxonomic assignments of a subset of metagenome-assembled genomes, while gNOMO2 employs Kaiju for direct taxonomic classification of reads.
While gNOMO2 did not detect differ entiall y abundant taxa between seasons across MG, MT, and MP la yers , our TIGRFAMs and KEGG pathway-based analyses indicated an elevation in fatty acid degradation at the MT level (Fig. 5 B), aligning with the original study.The joint visualization plot highlighted MG, MT, and MP features along with covariates (Fig. 5 C).As a case point, the plot r e v ealed the association of Tetr asphaer a with autumn, which has been reported in previous studies to be associated with sludge bulking that fr equentl y occurs in w astew ater treatment plants [ 62 , 63 ].
In our pathway-level integration analysis (Fig. 5D), we illustrated variations in the components of fatty acid degradation and glycerolipid metabolism among study groups across different omics levels. Specifically, gNOMO2 showcased an increase in fatty acid degradation at the MT level while detecting an elevation in glycerolipid metabolism at both MT and MP levels, as indicated and discussed in detail in the original study.
When AS data were integrated using Module 6, gNOMO2 constructed a proteogenomic database comprising 4,959,677 proteins, incorporating 859,729 nonredundant proteins derived from the top 10 most abundant genera identified in the AS analysis, in addition to the 4,025,111 proteins obtained from MG and MT analyses. Interestingly, this integration resulted in a slight decrease in the number of detected unique peptides (∼2%), indicating the importance of database size optimization in multi-omics studies that include MP. The inclusion of AS data did not alter the other outcomes derived from the MP data analysis.
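The genus selection step described above can be illustrated with a minimal sketch. It is not gNOMO2's actual implementation: the function name `top_genera`, the input layout, the ranking criterion (mean relative abundance across samples), and the toy counts are all assumptions made for illustration only.

```python
from collections import defaultdict


def top_genera(counts, n=10):
    """Rank genera by mean relative abundance across samples (illustrative only).

    counts: dict mapping sample name -> dict of genus name -> read count.
    Returns the names of the n highest-ranked genera.
    """
    summed = defaultdict(float)
    used_samples = 0
    for genus_counts in counts.values():
        depth = sum(genus_counts.values())
        if depth == 0:
            continue  # an empty sample carries no compositional information
        used_samples += 1
        for genus, reads in genus_counts.items():
            # Convert raw counts to per-sample relative abundance before averaging,
            # so that deeply sequenced samples do not dominate the ranking.
            summed[genus] += reads / depth
    if used_samples == 0:
        return []
    mean_rel = {genus: total / used_samples for genus, total in summed.items()}
    # Sort by decreasing abundance; break ties alphabetically for determinism.
    return sorted(mean_rel, key=lambda g: (-mean_rel[g], g))[:n]


# Hypothetical toy counts, not taken from the study.
toy = {
    "sample_1": {"genus_A": 10, "genus_B": 80, "genus_C": 10},
    "sample_2": {"genus_A": 30, "genus_B": 60, "genus_C": 10},
}
print(top_genera(toy, n=2))
```

Under these assumptions, proteins for the selected genera would then be retrieved and appended to the MG- and MT-derived database. Varying `n` is one way to explore the trade-off between database size and peptide identification noted above.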
Our findings highlight that read-based and MAG-based taxonomic composition analysis approaches can lead to divergent results and interpretations. Since neither approach is inherently wrong, this disparity underscores the significance and advantage of thoroughly examining meta-omics datasets using various methodologies. Hence, we underscore that employing diverse approaches and perspectives in complex multi-omics datasets may reveal novel insights extending beyond the original hypothesis.

Discussion
gNOMO2 stands as a versatile and modular bioinformatic pipeline designed for integrated multi-omics analyses of AS, MG, MT, and MP data in a reproducible fashion. Our open-source tool processes raw data and generates summary tables and figures with just a single, straightforward command. gNOMO2 encompasses preprocessing, genome mapping, assembly, protein predictions, taxonomic and functional annotations, proteogenomic database generation, and differential abundance analysis steps for each omics layer. Furthermore, gNOMO2 offers a holistic perspective through integrated visualization of omics layers and facilitates pathway-level integrative analysis. In addition, it includes a dedicated module for AS data processing and automatic protein database generation for MP studies. gNOMO2 generates results that can serve as inputs for subsequent microbiome analyses using various bioinformatics tools, enhancing user flexibility throughout the process. The demonstrated efficacy of gNOMO2 with real datasets underscores its value as a tool across various multi-omics combinations in microbiome research. Finally, the emphasis on reproducibility is a cornerstone of gNOMO2, as it not only streamlines the analytical process but also ensures the reliability of results by providing users with fully documented and executable workflows, enhancing the transparency and replicability in omics-driven microbiome research.
Despite its usefulness and effectiveness in multi-omics based microbiome research, gNOMO2 still has certain limitations. First, its performance may be influenced by the quality and depth of input data, thereby necessitating potential parameter optimizations by the user. Second, gNOMO2 relies on existing databases for taxonomic and functional annotations, which may restrict the detection of features not cataloged within these databases. Moreover, gNOMO2's efficacy may also be influenced by the complexity of microbial communities, particularly in cases of high diversity or rare taxa, where accurate profiling may be challenging. Lastly, users should be aware that gNOMO2 assumes a certain level of computational proficiency, and while efforts have been made to enhance user-friendliness, beginners may still face a learning curve because there is no graphical user interface provided.
Future versions of gNOMO2 could address these limitations through continuous updates, improved algorithmic approaches, and increased flexibility in handling diverse omics types, datasets, and microbial community structures.

Figure 1: Overview of gNOMO and gNOMO2 pipelines. (A) gNOMO accepts MG, MT, and MP data as input, providing differential abundance analysis results for each omics layer. It also generates a protein database using MG and MT data and performs a pathway-level integrated analysis. (B) gNOMO2 comprises 6 modules, each tailored for specific omics data. Module 1 accepts 16S rRNA gene amplicon sequencing data (AS) as input and generates a protein database suitable for metaproteomics studies, a taxa abundance plot, and a phyloseq object that can be used for downstream analysis in other microbiome tools. Modules 2 to 6 handle different combinations of AS, MG, MT, and MP data, creating omics-specific protein databases, abundance tables, plots, differential abundance analysis results, and pathway-level integration analysis results.

Figure 2: Overview of gNOMO2 results for the Granato et al. [53] study. (A) Representation of the 10 most prevalent genera in saliva microbiota samples. AS-based representations of salivary microbiota composition across samples, highlighting the 10 most common bacterial genera. Each bar indicates the relative abundance distribution for a sample. (B) Abundance distribution of differentially abundant taxa across study groups, presented separately for AS (upper) and MP (lower) data. (C) Joint visualization-based integration results for AS, MP, and metadata. Blue labels represent taxa, green labels represent peptides, and black labels represent metadata columns. Patient samples are marked with blue dots, while healthy samples are marked with red dots.

Figure 3: Overview of gNOMO2 results for the Tanca et al. [58] study. (A) Representation of the 10 most prevalent genera in gut microbiota samples, as shown by MG and MP. The left side illustrates the 10 most common bacterial genera based on MG data, while the right side represents MP-based findings. Each bar represents the relative abundance distribution for a sample. (B) Abundance distribution of differentially abundant taxa across study groups, separately for MG (upper) and MP (lower) data. (C) Joint visualization-based integration results for MG, MP, and metadata. Blue labels represent taxa, green labels show peptides, and black labels represent metadata columns. Male samples are marked with blue dots, while female samples are marked with red dots. (D) Pathway-level integration results, demonstrating the relationship across different omics levels. The findings from MG and MP are illustrated separately as split nodes on the left and right, respectively.

Figure 4: Overview of gNOMO2 results for the Duru et al. [60] study. (A) Representation of the 10 most common genera in cheese microbiota samples. MG- and MT-based overview of cheese microbiota composition across samples. The 10 most common bacterial genera in cheese microbiota samples are shown for MG (left) and MT (right). Each bar represents the relative abundance distribution for a sample. (B) Abundance distribution of differentially abundant taxa across study groups by MG (upper) and MT (lower). (C) Joint visualization-based integration results for MG, MT, and metadata. (D) Pathway-level integration results, demonstrating the relationship across different omics levels. The findings from MG and MT are illustrated separately as split nodes on the left and right, respectively.

Figure 5: Overview of gNOMO2 results for the Herold et al. (2020) study. (A) Representation of the 10 most common genera in wastewater microbiota samples. MG-, MT-, and MP-based overview of wastewater microbiota composition across samples. The 10 most common bacterial genera in wastewater microbiota samples by MG (left), MT (middle), and MP (right). Each bar represents the relative abundance distribution for a sample. (B) Abundance distribution of differentially abundant taxa across study groups by MG (upper), MT (middle), and MP (lower). (C) Joint visualization-based integration results for MG, MT, MP, and metadata. (D) Pathway-level integration results, demonstrating the relationship across different omics levels. The findings from MG, MT, and MP are illustrated separately as split nodes on the left, middle, and right, respectively.